[SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown #29092

tanelk · 2020-07-13T20:59:10Z

What changes were proposed in this pull request?

Add an And(IsNotNull(e), GreaterThan(Size(e), Literal(0))) filter before Explode, PosExplode and Inline when outer = false.
Also remove the unused InferFiltersFromConstraints from operatorOptimizationRuleSet to avoid the confusion that arose during the review process.
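
The shape of this rewrite can be sketched on a toy plan tree. This is only an illustration under stated assumptions: Attr, Lit, Relation, Filter, Generate and InferFiltersFromGenerateSketch below are hypothetical stand-ins, not Spark's Catalyst classes, and the sketch models neither the foldable check nor the constraint-based idempotency handling mentioned elsewhere in this PR.

```scala
// Toy stand-ins for Catalyst expressions and plans (hypothetical, NOT Spark's real API).
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class IsNotNull(child: Expr) extends Expr
case class Size(child: Expr) extends Expr
case class GreaterThan(left: Expr, right: Expr) extends Expr
case class And(left: Expr, right: Expr) extends Expr

sealed trait Plan
case class Relation(name: String) extends Plan
case class Filter(condition: Expr, child: Plan) extends Plan
// generatorInput is the array/map expression handed to explode/posexplode/inline.
case class Generate(generatorInput: Expr, outer: Boolean, child: Plan) extends Plan

object InferFiltersFromGenerateSketch {
  // The predicate named in the PR description.
  def inferred(e: Expr): Expr = And(IsNotNull(e), GreaterThan(Size(e), Lit(0)))

  // Rewrites only the root node; a real optimizer rule would traverse the whole plan.
  def apply(plan: Plan): Plan = plan match {
    // Outer generators keep rows with null or empty input, so no filter is added.
    case g @ Generate(_, true, _) => g
    // Simplified idempotency check: do not add the same filter twice.
    case g @ Generate(e, false, Filter(cond, _)) if cond == inferred(e) => g
    case Generate(e, false, child) => Generate(e, false, Filter(inferred(e), child))
    case other => other
  }
}
```

Applying the sketch to Generate(Attr("arr"), outer = false, Relation("t")) places a Filter between the generator and the relation, and applying it a second time leaves the plan unchanged.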

Why are the changes needed?

Predicate pushdown can then move the new filter down through joins and into data sources, improving performance.
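
Adding the filter is safe because an inner explode already produces no output for rows whose array is null or empty. A minimal sketch with plain Scala collections, assuming Option models SQL NULL; ExplodeEquivalence, explodeInner and keep are hypothetical names used only for illustration:

```scala
object ExplodeEquivalence {
  // A row is (id, nullable array); None stands in for SQL NULL.
  type Row = (Int, Option[Seq[String]])

  // Inner explode: one output row per element; null or empty arrays yield nothing.
  def explodeInner(rows: Seq[Row]): Seq[(Int, String)] =
    rows.flatMap { case (id, arr) => arr.getOrElse(Seq.empty[String]).map(x => (id, x)) }

  // The inferred predicate: the array is not null and its size is > 0.
  def keep(row: Row): Boolean = row._2.exists(_.nonEmpty)

  val rows: Seq[Row] = Seq(
    (1, Some(Seq("a", "b"))),
    (2, Some(Seq.empty[String])),
    (3, None),
    (4, Some(Seq("c")))
  )
}
```

Here explodeInner(rows) and explodeInner(rows.filter(keep)) give the same result, while the filtered input carries only rows 1 and 4. That smaller input is what a join below the generator would see once the filter is pushed down.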

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test

tanelk · 2020-09-16T19:39:15Z

cc @maropu @dongjoon-hyun

Also, I messed up one commit, which made the bot add the wrong labels.

maropu · 2020-09-17T01:35:56Z

ok to test

SparkQA · 2020-09-17T07:02:59Z

Test build #128783 has finished for PR 29092 at commit ffe43d7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

tanelk · 2020-09-29T18:46:11Z

Pinging @viirya and @cloud-fan for a possible second review.

SparkQA · 2020-09-29T19:32:26Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33866/

SparkQA · 2020-09-29T19:48:45Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33866/

SparkQA · 2020-10-07T15:30:29Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34119/

SparkQA · 2020-10-07T15:54:41Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34119/

SparkQA · 2020-10-07T16:24:17Z

Test build #129514 has finished for PR 29092 at commit a25bd6f.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-10-08T04:33:48Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34144/

SparkQA · 2020-10-08T05:00:19Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34144/

SparkQA · 2020-10-08T07:05:02Z

Test build #129538 has finished for PR 29092 at commit d4089ca.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-10-12T15:17:06Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34301/

SparkQA · 2020-10-12T15:40:02Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34301/

SparkQA · 2020-10-12T19:00:17Z

Test build #129695 has finished for PR 29092 at commit e3a0205.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2020-10-12T23:50:52Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

        InferFiltersFromConstraints) ::
      Batch("Operator Optimization after Inferring Filters", fixedPoint,
-        rulesWithoutInferFiltersFromConstraints: _*) ::
+        operatorOptimizationRuleSet: _*) ::


nit: operatorOptimizationRuleSet -> rulesWithoutInferFilters?

maropu · 2020-10-12T23:59:02Z

Looks okay otherwise.

SparkQA · 2020-10-13T03:01:15Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34326/

SparkQA · 2020-10-13T03:25:14Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34326/

SparkQA · 2020-10-13T06:50:03Z

Test build #129720 has finished for PR 29092 at commit b7a7c49.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-10-13T10:15:26Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34344/

SparkQA · 2020-10-13T10:43:26Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34344/

maropu · 2020-10-13T11:11:35Z

GA passed. Thanks! Merged to master.

SparkQA · 2020-10-13T13:55:01Z

Test build #129738 has finished for PR 29092 at commit 857c007.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2020-10-13T16:11:11Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

+      // Exclude child's constraints to guarantee idempotency
+      val inferredFilters = ExpressionSet(
+        Seq(
+          GreaterThan(Size(g.children.head), Literal(0)),


This can be useful for filtering rows before a join, but can we push the Size expression down to a data source?

I don't think so...

probot-autolabeler bot added the SQL label Jul 13, 2020

probot-autolabeler bot added BUILD CORE INFRA PYTHON R WINDOWS labels Sep 13, 2020

tanelk closed this Sep 13, 2020

tanelk deleted the SPARK-32295 branch September 13, 2020 07:11

tanelk restored the SPARK-32295 branch September 16, 2020 18:16

tanelk reopened this Sep 16, 2020

tanelk closed this Sep 16, 2020

tanelk force-pushed the SPARK-32295 branch from 906f941 to 2e3aa2f September 16, 2020 18:20

InferFiltersFromGenerate

8cfa6af

tanelk reopened this Sep 16, 2020

Don't infer for foldable

ffe43d7

maropu removed BUILD CORE INFRA PYTHON R WINDOWS labels Sep 17, 2020

maropu reviewed Sep 17, 2020

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

style

3fc4b12

tanelk added 2 commits October 8, 2020 06:39

Restore foldable check

3bbff2c

Add explanaiton

d4089ca

Clean up operatorOptimizationRuleSet

e3a0205

maropu reviewed Oct 12, 2020

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

maropu changed the title ~~[SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown~~ [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown Oct 12, 2020

Better comment

b7a7c49

cloud-fan reviewed Oct 13, 2020

...t/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromGenerateSuite.scala

cloud-fan approved these changes Oct 13, 2020

Remove redundant code from UT

857c007

maropu approved these changes Oct 13, 2020

maropu closed this in 17eebd7 Oct 13, 2020

viirya reviewed Oct 13, 2020
